High-Dimensional OLAP: A Minimal Cubing Approach

Authors

  • Xiaolei Li
  • Jiawei Han
  • Hector Gonzalez
Abstract

Data cubes have played an essential role in fast OLAP (online analytical processing) in many multi-dimensional data warehouses. However, some data sets in applications such as bioinformatics, statistics, and text processing are characterized by high dimensionality (e.g., over 100 dimensions) and moderate size (e.g., around 10^6 tuples). No feasible data cube can be constructed for such data sets. In this paper we address the problem of developing an efficient algorithm to perform OLAP on such data sets. Experience tells us that although data analysis tasks may involve a high-dimensional space, most OLAP operations are performed on only a small number of dimensions at a time. Based on this observation, we propose a novel method that computes a thin layer of the data cube together with associated value-list indices. This layer, while manageable in size, is capable of supporting flexible and fast OLAP operations in the original high-dimensional space. Through experiments we show that the method has I/O costs that scale well with dimensionality. Furthermore, the costs are comparable to those of accessing an existing data cube when full materialization is possible.
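The idea of a thin cube layer backed by value-list indices can be illustrated with a small sketch. This is not the authors' algorithm: the abstract gives no implementation details, so the function names, the in-memory dictionaries, and the choice of COUNT as the aggregate are all assumptions made for illustration. The sketch indexes each (dimension, value) pair by the IDs of the tuples that contain it, precomputes the cuboids for one small group of dimensions, and answers a query on a few dimensions by intersecting ID lists.

```python
from collections import defaultdict
from itertools import combinations


def build_value_list_index(rows):
    """Map each (dimension, value) pair to the list of tuple IDs holding it."""
    index = defaultdict(list)
    for tid, row in enumerate(rows):
        for dim, value in enumerate(row):
            index[(dim, value)].append(tid)
    return index


def build_fragment_cuboids(rows, fragment):
    """Hypothetical helper: for one small group of dimensions, store a
    tuple-ID list for every value combination over every non-empty
    subset of that group."""
    cuboids = {}
    for k in range(1, len(fragment) + 1):
        for dims in combinations(fragment, k):
            cells = defaultdict(list)
            for tid, row in enumerate(rows):
                cells[tuple(row[d] for d in dims)].append(tid)
            cuboids[dims] = dict(cells)
    return cuboids


def point_query(index, conditions):
    """Count tuples matching {dimension: value} by intersecting ID lists."""
    id_sets = [set(index.get((d, v), ())) for d, v in conditions.items()]
    return len(set.intersection(*id_sets)) if id_sets else 0


# Toy data: four tuples over three dimensions (illustrative values only).
rows = [
    ("a1", "b1", "c1"),
    ("a1", "b2", "c1"),
    ("a2", "b1", "c2"),
    ("a1", "b1", "c2"),
]
index = build_value_list_index(rows)
cuboids = build_fragment_cuboids(rows, (0, 1))
print(point_query(index, {0: "a1", 1: "b1"}))
print(cuboids[(0, 1)][("a1", "b1")])
```

Because only small groups of dimensions are materialized, storage in this sketch grows with the number of groups rather than exponentially with the total number of dimensions; queries that span several groups fall back to intersecting ID lists at query time.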

Similar resources

High-Dimensional Hierarchical OLAP: A Prefix-Index Hierarchical Cubing Approach

The pre-computation of data cubes is critical for improving the response time of OLAP (online analytical processing) systems and accelerating data mining tasks in large data warehouses. However, as data warehouses grow, the time needed for this pre-computation becomes a significant performance bottleneck. In a high dimensional OLAP, it might not be practical to build all th...

OLAP on Structurally Significant Data in Graphs

Summarized data analysis of graphs using OLAP (Online Analytical Processing) is very popular these days. However, due to high dimensionality and large size, it is not easy to decide which data should be aggregated for OLAP analysis. Although iceberg cubing is useful, it is unaware of the significance of dimensional values with respect to the structure of the graph. In this paper, we propose a ...

Multi-Dimensional Analysis of Data Streams Using Stream Cubes

Large volumes of dynamic stream data pose great challenges to their analysis. Besides its dynamic and transient behavior, stream data has another important characteristic: multi-dimensionality. Much stream data resides in a multidimensional space and at a rather low level of abstraction, whereas most analysts are interested in relatively high-level dynamic changes in some combination of dimensio...

CUBIST: A New Approach to Speeding Up OLAP Queries

We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. Specifically, we introduce a new data structure, called Statistics Tree (ST), together with an algorithm, called CubiST (Cubing with Statistics Trees), for evaluating OLAP queries on top of a r...

CubiST: A New Approach to Improving the Performance of Ad-Hoc Cube Queries (Ph.D. dissertation by Lixin Fu, University of Florida)

We provide a new approach to speeding up the evaluation of cube queries, an important class of OLAP queries which return aggregated values rather than sets of tuples. Our new algorithm, called CubiST (Cubing with Statistics Trees), represe...

Publication date: 2004